Abstract

Background: Glioblastoma is a complex multifactorial disorder that has swift and devastating consequences. Few 
genes have been consistently identified as prognostic biomarkers of glioblastoma survival. The goal of this study 
was to identify general and clinical-dependent biomarker genes and biological processes of three complementary 
events: lifetime, overall and progression-free glioblastoma survival. 

Methods: A novel analytical strategy was developed to identify general associations between the biomarkers and 
glioblastoma, and associations that depend on cohort groups, such as race, gender, and therapy. Gene network 
inference, cross-validation and functional analyses further supported the identified biomarkers. 

Results: A total of 61, 47 and 60 gene expression profiles were significantly associated with lifetime, overall, and 
progression-free survival, respectively. The vast majority of these genes have been previously reported to be 
associated with glioblastoma (35, 24, and 35 genes, respectively) or with other cancers (10, 19, and 15 genes, 
respectively) and the rest (16, 4, and 10 genes, respectively) are novel associations. Pik3r1, E2f3, Akr1c3, Csf1, Jag2,
Plcg1, Rpl37a, Sod2, Topors, Hras, Mdm2, Camk2g, Fstl1, Il13ra1, Mtap and Tp53 were associated with multiple survival
events.

Most genes (from 90 to 96%) were associated with survival in a general or cohort-independent manner and thus 
the same trend is observed across all clinical levels studied. The most extreme associations between profiles and 
survival were observed for Syne1, Pdcd4, Ighg1, Tgfa, Pla2g7, and Paics. Several genes were found to have a cohort-
dependent association with survival and these associations are the basis for individualized prognostic and gene-
based therapies. C2, Egfr, Prkcb, Igf2bp3, and Gdf10 had gender-dependent associations; Sox10, Rps20, Rab31, and
Vav3 had race-dependent associations; Chi3l1, Prkcb, Polr2d, and Apool had therapy-dependent associations.
Biological processes associated with glioblastoma survival included morphogenesis, cell cycle, aging, response to stimuli,
and programmed cell death.

Conclusions: Known biomarkers of glioblastoma survival were confirmed, and new general and clinical-dependent 
gene profiles were uncovered. The comparison of biomarkers across glioblastoma phases and functional analyses 
offered insights into the role of genes. These findings support the development of more accurate and personalized 
prognostic tools and gene-based therapies that improve the survival and quality of life of individuals afflicted by 
glioblastoma multiforme. 
Background

Glioblastoma multiforme (glioblastoma, World Health 
Organization grade IV astrocytoma) accounts for 15%- 
20% of all intracranial tumors and 50% of all brain 
malignancies [1]. This aggressive malignant type of pri- 
mary brain tumor has swift and devastating conse- 
quences resulting in a median survival after diagnosis of 
one year [2,3,2]. Primary glioblastoma has a higher inci- 
dence in Caucasian men than in other racial and gender 
groups [4] although these differences may be con- 
founded with differences in access to health care or 
diagnostic practices [5]. Also, the variation in response 
to glioblastoma therapies and similar median survival 
across therapies has prevented the identification of a 
therapy or therapies directly associated with glioblas- 
toma survival [6-9]. 

Numerous studies have proposed biomarker genes 
that can be used to accurately predict the clinical course 
of glioblastoma [10-16]. Although some genes have been 
associated with the presence of glioblastoma, few have 
been identified as prognostic biomarkers of glioblastoma 
survival and fewer have been confirmed in independent 
reports. The limited reproducibility of gene-glioblastoma 
associations may be, in part, due to limited or no con- 
sideration of the clinical characteristics of the indivi- 
duals studied, such as gender and therapy subject 
[17-19]. Another reason for the lack of confirmation of 
biomarker genes of glioblastoma may be the considera- 
tion of the association between glioblastoma and indivi- 
dual genes independently, although multiple genes 
acting in unison are known to influence this disease. 
Statistical reasons for this lack of confirmation include 
the analysis of gene expression levels in glioblastoma 
versus non-glioblastoma samples instead of analyzing 
survival, and the failure to correctly model the censored 
nature of the observations that may not exhibit the pro- 
gression or death event by the end of the period consid- 
ered. For example, The Cancer Genome Atlas Research 
Network (TCGA [20]) identified gene expression aberra- 
tions among the 206 glioblastoma cases considered but 
did not consider the age at glioblastoma death or pro- 
gression, nor the clinical characteristics of the indivi- 
duals studied. 

The goal of this study was to identify general and clin- 
ical-dependent biomarker genes and biological processes 
of three complementary events: lifetime, overall and pro- 
gression-free glioblastoma survival. A novel analytical 
strategy was developed to identify general and cohort- 
dependent associations between the biomarkers and the 
three glioblastoma events. Cross-validation and functional
analysis further supported the identified biomarkers. The
identification of gene biomarkers of glioblastoma survival
supports efficient follow-up studies using in vitro and in
vivo experiments and augments the molecular toolbox that can
be used to classify patients across and within cohort groups
with respect to prognosis and the development of targeted
treatments.

Methods 

Data 

Clinical and gene expression information from 320 individuals
diagnosed with glioblastoma was obtained from
the TCGA repository (September 2009 data freeze [21]).
Protocols for specimen preparation and gene expression 
measurements are described in detail in the report by 
The Cancer Genome Atlas Research Network [20]. 
Briefly, a retrospective search in glioblastoma sample 
banks identified newly diagnosed glioblastoma cases 
based on surgical pathology reports and clinical records. 
Only samples that had demographic, clinical and patho- 
logical information, a minimum of 80% tumor nuclei, 
and a maximum of 50% necrosis, qualified for gene 
expression analyses. 

The data included glioblastoma diagnostic and death 
records between the years 1989 and 2009. Clinical fac- 
tors used to classify individuals into cohort groups were 
Gender (Male or Female), Race (White Caucasian or 
Other), Therapy received (R = radiation alone; CRnoT = 
chemo, radiation and not targeted therapy plus other 
therapy if present; CRT = chemo plus radiation and tar- 
geted therapy only; Other = any other combination of 
radiation, chemo, targeted, immune and hormonal ther- 
apy; or None = no therapy), and detection of glioblas- 
toma progression or recurrence (ProgRec - Yes/No). 

Three glioblastoma time-to-event variables were con- 
sidered: lifetime survival (encompassing the period from 
birth to death), overall or post-diagnosis survival 
(encompassing the period from glioblastoma diagnosis 
to death) and post-diagnosis progression-free survival 
(encompassing the period from glioblastoma diagnosis 
to progression of glioblastoma or to recurrence of glio- 
blastoma). The distribution of the observations among 
the levels of the clinical or cohort variables is presented 
in Table 1. There were 287 individuals with sufficient 
survival information for analysis. Three individuals were 
excluded from the progression-free survival analysis 
because of inconsistency in the dates for diagnosis and 
progression or recurrence. 

Gene expression measurements were obtained using 
the Affymetrix HT HG-U133A platform, comprising 
22,277 probe sets. The gene expression measurements 
were obtained in ten experimental batches, in which the 
percentage of individuals per batch ranged from 4.35% 
to 21.25%. For samples with multiple gene expression 
measurements, the correlation between measurements 
across microarrays was higher than 0.98 and, thus, the 
average expression was used to represent the sample. 



Serao et al. BMC Medical Genomics 2011, 4:49
http://www.biomedcentral.com/1755-8794/4/49






Table 1 Median length of the hazard period and (relative) frequency of the individuals across clinical cohort levels

Lifetime survival (n = 287) 1    Overall survival (n = 287) 2    Progression-free survival (n = 284) 2

1 expressed in years;
2 expressed in months.



Raw expression data was log2 transformed and normal- 
ized using quantile normalization and GC-RMA [22] 
approaches implemented in Beehive [23]. 
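
As an illustration of the normalization idea only, the following minimal Python sketch applies a log2 transformation followed by quantile normalization to a few toy arrays. It is not the Beehive or GC-RMA implementation (background correction and probe summarization are omitted), the function name `log2_quantile_normalize` and the toy intensities are hypothetical, and ties are handled naively.

```python
import math

def log2_quantile_normalize(samples):
    """samples: one list of raw intensities per array, all of equal length."""
    logged = [[math.log2(v) for v in s] for s in samples]
    n = len(logged[0])
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(s) for s in logged]
    reference = [sum(a[k] for a in sorted_arrays) / len(sorted_arrays)
                 for k in range(n)]
    normalized = []
    for s in logged:
        order = sorted(range(n), key=lambda i: s[i])  # probe indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by the reference quantile
        normalized.append(out)
    return normalized

# Toy data (hypothetical): two arrays, three probes each.
raw = [[4, 16, 64], [8, 2, 32]]
norm = log2_quantile_normalize(raw)
```

After this step every array shares the same set of values, so differences between arrays reflect only the ranking of probes within each array.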

In addition to detecting genes in the microarray plat- 
form associated with the glioblastoma survival, particu- 
lar attention was given to genes known to be associated 
with glioblastoma and the association detected in this 
study. A list of 174 genes was compiled (see Additional file 1),
comprising 123 genes known to be associated with glioblastoma
that were identified from the literature [20,24-27] and 51 genes
in the KEGG glioma pathway [28].

Statistical Analysis 

A five-step approach was used to reduce the dimension- 
ality of the data set caused by the large number of 
probes and few records of the individuals in this experi- 
ment. First, a Cox proportional hazards survival analysis 
[29] was undertaken for each non-control probe in the 
microarray platform. The model included all the clinical 
variables with the profile of only one probe. This step 
allowed the selection of probes associated with each of 
the three survival variables at P-value < 0.01. This mild 
threshold was used to minimize the chances of false 
negative associations and evaluate in subsequent steps 
probes with strong or moderate associations with glio- 
blastoma per se. The number of probes identified for 
lifetime survival, overall, and progression-free survival 
was 963, 839, and 1048 respectively. Second, for each 
one of the three glioblastoma time-to-event variables, 
the clinical variables and all remaining probes identified 
in the first step were included simultaneously in a Cox
survival model. In this manner, the clinical variables
were a fixed component of the model and the probes
associated with the survival variables were selected using
a combination of forward and stepwise model selection
methods. The forward selection method was used to 
add one probe at a time to the model containing the 
clinical variables using a significance level for entry of 
30%. In the complementary stepwise selection method, 
the inclusion of probes followed the same rules as in the 
forward method but a probe only remained in the 
model if its P-value was lower than the significance
level for stay of 10%. Since these two selection methods
could result in different models, a second stepwise selec- 
tion was performed using the significant probes. This 
step allowed identifying broad or general associations 
between probe profiles and glioblastoma survival. Third, 
the interaction between the remaining probes and clini- 
cal variables was evaluated using the stepwise approach. 
This step permitted the detection of clinical or cohort- 
dependent associations between probe profiles and glio- 
blastoma survival. The fourth stage of our approach 
aimed to select the significant probes from our list of 
174 known genes associated with glioblastoma (see 
Additional file 1) fitting the probes and interaction with 
the clinical variables using the stepwise selection 
method. The consideration of the known probes alone 
aimed at minimizing the potential masking of associa- 
tions by other probes in the model. Similarly to the pre- 
vious step, in the final step the probes identified from 
both sets of analyses were combined and further streamlined
using the stepwise method. This final step allowed
the confirmation of probe associations reported in
previous studies as well as the identification of novel
associations. With respect to P-value threshold selection
at each stage, a lenient first-stage threshold was used to
capture most true positive associations at the expense of
some false positives. The more stringent threshold used
in the subsequent steps and repeated selection process
minimized the number of false positives remaining in the
index. Thus, this approach would have the same effect
as reducing the threshold in the first step with the
added benefit of minimizing the loss of true positives.
Likewise, relaxing the first-stage threshold would have
resulted in more false positives being considered in the
second stage and higher risks of overparameterization.
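
The first-stage screen can be pictured with the following minimal Python sketch, which fits a single-covariate Cox model by Newton-Raphson on the partial likelihood and keeps probes whose Wald P-value falls below a threshold. This is an illustration under simplifying assumptions, not the software used in the study: the clinical variables that the study kept in every model are omitted, tied event times are handled as in the Breslow approximation, and the names `cox_univariate` and `screen_probes` and the toy data are hypothetical.

```python
import math

def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t) = h0(t) * exp(beta * x); return (beta, std_error, wald_p)."""
    beta, n = 0.0, len(times)
    for _ in range(max_iter):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored records enter only through risk sets
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            mean = s1 / s0
            grad += x[i] - mean          # score contribution
            info += s2 / s0 - mean ** 2  # observed information contribution
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided Wald test
    return beta, se, p

def screen_probes(times, events, probes, threshold=0.01):
    """Keep probes with Wald P-value below the threshold; values are (HR, P)."""
    kept = {}
    for name, values in probes.items():
        b, _, p_value = cox_univariate(times, events, values)
        if p_value < threshold:
            kept[name] = (math.exp(b), p_value)
    return kept

# Toy data (hypothetical): months to event, event indicator, probe expression.
times = [5, 8, 12, 14, 20, 25, 30, 34]
events = [1, 1, 1, 0, 1, 1, 0, 1]
probe = [2.1, 1.8, 1.9, 0.7, 1.2, 0.5, 0.9, 0.4]
beta, se, p = cox_univariate(times, events, probe)
```

In this toy example higher expression goes with earlier events, so the estimated coefficient is positive and the corresponding hazard ratio exceeds 1.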

In addition to a P-value, each probe had a hazard ratio 
(HR) estimate and associated 95% confidence interval 
limits. Hazard ratios below 1 indicate that the hazard 
under consideration decreases as the level of the gene 
increases. The proportional hazards assumption was 
assessed for the final predictive models corresponding to 
each survival variable based on the residuals. There was 
no evidence of departure from the assumptions for all 
the models reported. The association between survival 
and clinical and probe expression profiles was visualized 
by plotting the probability of survival predicted by the 
Cox model against time. For depiction purposes, individuals
were divided into low and high probe expression
groups that corresponded to the 25th and 75th percentiles,
respectively, given the median expression for all other
probes in the predictive model. The survival curves were
computed based on the information used to identify the 
significant gene associations. Biomarker genes resulting 
from the multi-stage approach were compared to pre- 
vious reports of genes associated with glioblastoma or 
other cancers. The protein interaction resource at the
NCBI Gene database [30] was used to check that biomarkers
not previously associated with cancer were also not
indirectly associated with cancer through intermediate
genes.
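
For readers less familiar with hazard ratios, the relation between a Cox regression coefficient and the reported HR with its 95% confidence interval can be sketched as follows. The coefficient and standard error used here are hypothetical values chosen for illustration, not estimates from this study, and the helper name `hazard_ratio` is an assumption.

```python
import math

def hazard_ratio(beta, se, z=1.96):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient: a one-unit increase in expression halves the hazard.
hr, lower, upper = hazard_ratio(math.log(0.5), 0.10)
protective = upper < 1.0  # the whole interval lies below 1
```

An interval that lies entirely below 1 indicates that the hazard decreases as expression increases, whereas an interval that includes 1 corresponds to a non-significant association at the 5% level.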

The genes identified by the five-step approach were 
compared to those resulting from a more conventional 
analysis using a one-step Cox survival analysis with a 
stringent cut-off (P-value < 0.0001). 

Functional and Gene Network Analyses 

Identification of Gene Ontology (GO) categories (mole- 
cular function and biological process) and KEGG path- 
ways represented among the significant genes associated 
with each glioblastoma survival variable was undertaken 
[31,32]. The representation of genes in the GO and 
KEGG pathway classes was evaluated using Fisher's 
exact (two-tailed) test and False Discovery Rate multiple 
test adjustment [33]. The relationships between the bio- 
marker genes were further studied for the three
glioblastoma survival variables and significant functional
categories. The BisoGenet plug-in [34] from the Cytos- 
cape software [35] was used to build and visualize the 
networks for each one of the three glioblastoma survival 
variables using the respective list of significant genes 
from the GO categories. All the available data sources 
in BisoGenet (including BIOGRID, DIP, BIND and 
others) were selected to generate the interactions. To 
facilitate the visualization of the networks, only interac- 
tions (edges) connecting two significant genes (nodes) 
directly or through an intermediate gene were depicted. 
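
The enrichment test described above can be sketched in a few lines of Python. The sketch computes a two-tailed Fisher's exact P-value for a 2 x 2 table (significant gene or not, in category or not) and adjusts a set of P-values with the Benjamini-Hochberg procedure. The choice of Benjamini-Hochberg as the False Discovery Rate method is an assumption, and the counts and P-values are hypothetical.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Table [[a, b], [c, d]]: rows = in gene list or not, columns = in category or not."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k listed genes falling in the category.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def benjamini_hochberg(pvalues):
    """Return FDR-adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_category = fisher_exact_two_tailed(3, 1, 1, 3)       # hypothetical counts
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])      # hypothetical P-values
```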

Cross-validation 

The associations between gene profiles and survival
detected in this study were confirmed using a three-fold
approach. First, a leave-one-out cross-validation (LOOCV)
approach was used [36-38]. LOOCV is especially recommended
for data sets of limited size, providing an almost unbiased
estimator and identifying the same best classifiers as other
X-fold training-test data partitions [38,39].
Validation of the predictive survival equation and bio- 
markers detected in a training data set on an indepen- 
dent test data set is desirable, followed by X-fold cross- 
validation on a particular data set. The representation of 
all cohort factors on both the training and test sets is 
necessary for unbiased evaluation of the biomarkers and 
to ensure that the detected biomarkers were not a spurious
artifact of ignored cohort effects and for a fair evaluation
of the training estimates. Consideration of race is
particularly critical for the validation of biomarkers
detected in this study because lack of adjustment for
this cohort factor could result in the identification of
associations that are due to genetic background and not
the particular gene expression profile.

For the X-fold validation approach, the specification of
suitable training and testing data sets would have
required at least 200 patients in each data set (5 individuals
x 2 races x 2 genders x 5 therapies x 2 recurrence
groups) and only 287 individuals were available. The
minimum of 5 individuals per group minimizes the risk
of confounding between individual variation and cohort
variation. Use of smaller data sets would have led to
low power and biased findings because of the poor
representation of individuals across cohort groups. Thus, the
X-fold cross-validation could not be implemented. Likewise,
the test of the predictive hazard equations (that
include cohort factors) on an independent data set
could not be implemented due to the lack of a data set
with comparable cohort information or adequate structure
that would minimize the risk of confounding
between factors.
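
The sample size argument above is simple arithmetic and can be checked with the short Python sketch below. The dictionary keys are illustrative labels; the level counts and the minimum of 5 individuals per group are taken from the text.

```python
from math import prod

# Number of levels per cohort factor, as stated in the text.
levels = {"race": 2, "gender": 2, "therapy": 5, "progression_or_recurrence": 2}
min_per_cell = 5
available = 287

cells = prod(levels.values())             # 40 cohort combinations
required_per_set = min_per_cell * cells   # 200 individuals per data set
enough_for_split = 2 * required_per_set <= available  # training plus test set
```

With 40 cohort combinations, a training set and a test set would together need 400 individuals, which exceeds the 287 available.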

Accurate validation of associations between biomarkers
and survival was attained using LOOCV discriminant
analysis [40], which allows the assessment of the
performance of biomarkers to classify individuals into
high and low hazard (low and high survival, respectively).
The same cohort information was used to obtain
parameter estimates and to train the predictive hazard 
equations. For each survival variable, the median length 
of the period considered (age at death for lifetime survi- 
val; months from diagnosis to death for overall survival 
and; months from diagnosis to progression/recurrence 
for progression free survival) was calculated, and indivi- 
duals were classified into either a high or low hazard 
group based on the median. The 20% of the individuals 
that had a length of period closest (higher or lower) to 
the median were not considered in order to minimize 
borderline cases that could affect the assessment of the 
model performance. Only non-censored records were 
used in the cross-validation analysis to favor unbiased 
classification. All individuals except for one were used
to develop a new hazard index using the biomarkers
previously detected and the new index was used to classify
the remaining individual. This leave-one-individual-out
analysis was repeated for all individuals and the
observed and predicted affiliations of the individuals to 
the high and low groups for each hazard were compared 
in order to assess the correct assignment rate. 
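
The leave-one-out procedure can be summarized with the following minimal Python sketch. It follows the steps described above (median split, removal of the 20% of individuals closest to the median, and one-at-a-time classification) but replaces the discriminant analysis and the re-estimated hazard index with a simple nearest-centroid rule on expression values, which is a stand-in assumed here for brevity. The function name `loocv_assignment_rate` and the toy data are hypothetical.

```python
from statistics import median

def loocv_assignment_rate(lengths, features, drop_fraction=0.20):
    """lengths: period length per individual; features: expression vector per individual."""
    med = median(lengths)
    # Discard the fraction of individuals whose period length is closest to the median.
    ranked = sorted(range(len(lengths)), key=lambda i: abs(lengths[i] - med))
    keep = sorted(ranked[int(round(drop_fraction * len(lengths))):])
    # A short period corresponds to high hazard (low survival).
    labels = {i: ("high" if lengths[i] < med else "low") for i in keep}
    correct = 0
    for held_out in keep:
        centroids = {}
        for group in ("high", "low"):
            members = [features[i] for i in keep
                       if i != held_out and labels[i] == group]
            centroids[group] = [sum(col) / len(members) for col in zip(*members)]

        def dist(group):
            return sum((a - b) ** 2
                       for a, b in zip(features[held_out], centroids[group]))

        predicted = min(centroids, key=dist)  # nearest class centroid
        correct += predicted == labels[held_out]
    return correct / len(keep)

# Toy data (hypothetical): months to death and one expression value per individual.
lengths = [2, 3, 4, 6, 7, 9, 11, 12, 14, 20]
features = [[5.0], [4.8], [5.2], [4.9], [4.1], [3.0], [2.1], [1.9], [2.2], [1.8]]
rate = loocv_assignment_rate(lengths, features)
```

The returned value is the correct assignment rate, that is, the proportion of held-out individuals whose predicted hazard group matches the observed one.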

Second, in addition to LOOCV, confirmation of the 
genes associated with the three glioblastoma hazards 
was investigated on the independent database 
REMBRANDT (REpository for Molecular BRAin Neo- 
plasia DaTa) [41,42]. This database includes gene 
expression and survival information on 181 individuals 
diagnosed with glioblastoma. Third, a literature review 
was undertaken to identify independent studies that 
have reported associations between the genes associated 
with survival detected in this study and glioblastoma or 
other cancer types. 

Results 

Confirmed and Novel Biomarkers of Glioblastoma 

The median length of the periods associated with life- 
time, overall, and progression-free survival across and 
within clinical or cohort group are presented in Table 1. 
The age of the individuals at death or at the end of the 
considered period ranged from 14 to 87 years with a 
median age of 60 years. The median survival length was 
59 years, 13 months and 7 months for lifetime, overall, 
and progression-free survival, respectively. 

A total of 168 significant associations between 
expression profiles and glioblastoma survival (61, 47 
and 60 associations for lifetime, overall, and progres- 
sion-free survival, respectively) from 139 genes were 
identified. Among these, 10 associations are borderline
significant (0.05 < P-value < 0.1) and are included in
the tables in support of other more significant
associations.



The vast majority of the genes associated with glio- 
blastoma survival have been previously reported to be 
associated with glioblastoma (35, 24, and 35 genes, 
respectively) or with another cancer (10, 19, and 15 
genes, respectively) and the rest (16, 4, and 10 genes, 
respectively) exhibited novel associations with glioblas- 
toma. Table 2 presents the distribution of genes and 
probes associated with more than one hazard. Cohort- 
independent and cohort-dependent associations, respec- 
tively, were uncovered for lifetime (Tables 3 and 4), 
overall (Tables 5 and 6), and progression-free (Tables 7 
and 8) survival. Cohort-independent associations 



Table 2 Genes and probes represented more than one time within or across the glioblastoma survival events

Gene       Lifetime Survival    Overall Survival    Progression-free survival
Actr2      -                    200727_s_at         -
           -                    200729_s_at         -
Akr1c3     209160_at            209160_at           -
App        -                    -                   211277_x_at
           -                    -                   214953_s_at
Camk2b     -                    -                   211483_x_at
           -                    -                   209956_s_at
Camk2g     212757_s_at          -                   214322_at
           214322_at            -                   -
Cdc42      208727_s_at          -                   -
           208728_s_at          -                   -
           214230_at            -                   -
Chi3l1     216546_s_at          -                   -
           209396_s_at          -                   -
Csf1       207082_at            209716_at           -
E2f3       203692_s_at          203693_s_at         203693_s_at
Egfr       211551_at            -                   -
           211607_x_at          -                   -
Fstl1      208782_at            -                   208782_at
Hras       -                    212983_at           212983_at
Ighg1      -                    211908_x_at         -
           -                    211693_at           -
Il13ra1    210904_s_at          -                   211612_s_at
Jag2       32137_at             209784_s_at         -
Mdm2       -                    217373_x_at         217373_x_at
Mtap       204956_at            -                   204956_at
Pik3r1     212240_s_at          212249_at           212239_at
Plcg1      216551_x_at          216551_x_at         -
Prkcb      207957_s_at          -                   -
           209685_s_at          -                   -
Rpl37a     213459_at            213459_at           -
Sod2       221477_s_at          215078_at           -
Timp3      -                    -                   201148_s_at
           -                    -                   201150_s_at
Topors     204071_s_at          204071_s_at         -
Tp53       211300_s_at          -                   211300_s_at
Table 3 Genes that have a general association (P-value < 0.05) with the lifetime glioblastoma survival

Gene Symbol   Probe Identifier   P-value    Hazard Ratio 1      Relevant literature references
Syne1         209447_at          <.0001     0.17 (0.10-0.32)    [60]°
E2f3          203692_s_at        <.0001     0.26 (0.15-0.44)    [28] G
Fstl1         208782_at          <.0001     0.31 (0.22-0.42)    [25] G
Ep300         213579_s_at        <.0001     0.34 (0.29-0.57)    [26] G
Gigyf2        212261_at          <.0001     0.39 (0.26-0.58)    n/a
Topors        204071_s_at        <.0001     0.41 (0.29-0.59)    [26] G
Chst4         220446_s_at        0.0989 2   0.44 (0.17-1.16)    [89]°
Sar1a         201543_s_at        <.0001     0.44 (0.29-0.66)    n/a
Il13ra1       210904_s_at        <.0001     0.47 (0.36-0.60)    [24] G
Sod2          221477_s_at        <.0001     0.47 (0.37-0.59)    [25] G
Rab15         221810_at          <.0001     0.48 (0.34-0.69)    n/a
Timm23        218118_s_at        0.0239     0.50 (0.27-0.91)    n/a
Kcnj4         208359_s_at        <.0001     0.50 (0.38-0.66)    n/a
Rpl37a        213459_at          0.0023     0.51 (0.33-0.79)    [90] G
Camk2g        214322_at          0.0135     0.53 (0.32-0.88)    [56] G
Plcg1         216551_x_at        0.0068     0.55 (0.35-0.85)    [26] G
Slc43a3       213113_s_at        0.0004     0.56 (0.40-0.77)    n/a
Cdc42         208727_s_at        <.0001     0.57 (0.45-0.73)    [26] G
Csf1          207082_at          0.0092     0.58 (0.38-0.88)    [26] G
Ccnb2         202705_at          0.0118     0.60 (0.40-0.89)    [91] G
Tlk2          212997_s_at        0.0004     0.64 (0.49-0.82)    n/a
Mtap          204956_at          0.0091     0.67 (0.49-0.91)    [26] G
Egfr          211551_at          0.0743 2   0.68 (0.45-1.04)    [24] G
Akt2          211453_s_at        0.0292     0.68 (0.48-0.96)    [86] G
Akr1c3        209160_at          <.0001     0.70 (0.62-0.81)    [26] G
Tp53          211300_s_at        0.0215     0.76 (0.60-0.96)    [25] G
Igf1          209541_at          0.0183     0.76 (0.61-0.95)    [26] G
Rpl10         221989_at          0.0392     0.80 (0.64-0.99)    [24] G
Arhgef4       205109_s_at        0.0647 2   0.80 (0.64-1.01)    n/a
Cdc42         214230_at          0.0554 2   0.82 (0.67-1.00)    [26] G
Chi3l1        216546_s_at        0.061      0.87 (0.75-1.00)    [70] G
Ppbp          214146_s_at        0.012      1.16 (1.03-1.30)    n/a
Cdkn2a        209644_x_at        0.0003     1.18 (1.08-1.29)    [92] G
Wdr67         214061_at          0.0237     1.30 (1.03-1.63)    [93]°
Tspyl5        213122_at          0.0003     1.34 (1.14-1.56)    n/a
Usf2          215737_x_at        <.0001     1.42 (1.19-1.69)    [94]°
Camk2g        212757_s_at        0.0078     1.54 (1.12-2.13)    [56] G
Pik3r1        212240_s_at        0.0022     1.67 (1.20-2.32)    [20] G
Akt1          207163_s_at        0.0005     1.70 (1.26-2.30)    [84]°
Rac2          213603_s_at        0.0001     1.74 (1.31-2.31)    [95] G
Six6          207250_at          <.0001     1.82 (1.45-2.28)    [96]°
Spg21         217827_s_at        0.0387     1.91 (1.03-3.52)    n/a
Wdyhv1        219060_at          0.0015     1.95 (1.29-2.94)    n/a
Uros          203031_s_at        0.0067     2.37 (1.27-4.42)    n/a
Lin7c         219399_at          0.0002     2.40 (1.51-3.80)    [97]°
Ros1          207569_at          <.0001     2.58 (1.73-3.85)    [98]°
Cdk2          204252_at          <.0001     2.74 (1.78-4.21)    [91] G
Jag2          32137_at           <.0001     2.78 (1.86-4.14)    [24] G
Kiaa0090      212395_s_at        <.0001     2.89 (1.87-4.47)    n/a
Ccnb1         214710_s_at        <.0001     3.16 (2.00-4.98)    [91] G
Scn5a         207413_s_at        <.0001     3.21 (1.79-5.74)    n/a
Col14a1       212865_s_at        <.0001     3.30 (1.93-5.63)    [99]°
Hoxa10        213147_at          <.0001     3.30 (1.93-5.65)    [100]°
Cdc42         208728_s_at        <.0001     3.94 (2.12-7.32)    [26] G
Pdcd4         202731_at          <.0001     4.68 (3.01-7.28)    [26] G

n/a, No association with any type of cancer found in literature;

1 Hazard ratio estimate (95% confidence interval);

2 Borderline significant (P-value < 0.1) included for completeness;

G Gene confirmed in an independent glioblastoma multiforme study; the number between square brackets denotes the corresponding literature reference;
° Gene confirmed in an independent study on any other type of cancer; the number between square brackets denotes the corresponding literature reference.



represented 90%, 96% and 92% of the significant gene 
associations for lifetime, overall, and progression-free 
survival, respectively. 

The five-step approach was consistently superior to a
one-step Cox analysis with more stringent P-value <
0.001 on all three survival indicators. For the three
variables studied (lifetime, overall, and progression-free
survival), the simpler approach identified 60, 71 and 67
probes, of which 19, 17, and 23, respectively, overlapped
with the corresponding 61, 47, and 60 probes identified in
the five-step analyses. Of the 139 probes identified by the
simpler approach and not identified by our approach, the
vast majority (123 probes across all three variables) have
not been associated with glioblastoma and could not be
confirmed.
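
The counts in this comparison can be reproduced with the short Python sketch below, which uses only the numbers quoted in the paragraph above; the variable names are illustrative.

```python
# Probes found by the one-step analysis and probes shared with the five-step analysis.
one_step = {"lifetime": 60, "overall": 71, "progression_free": 67}
shared = {"lifetime": 19, "overall": 17, "progression_free": 23}

only_one_step = {k: one_step[k] - shared[k] for k in one_step}  # 41, 54, 44
total_only_one_step = sum(only_one_step.values())               # 139
unconfirmed_share = 123 / total_only_one_step                   # about 0.88
```

Roughly 88% of the probes found only by the one-step analysis therefore lack a previously reported association with glioblastoma.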



Genes Associated with Lifetime Death Hazard 

Sixty-one gene profiles, representing 55 genes, were
associated with lifetime survival. An increase in the level
of expression of 31 genes was associated with a decrease
in HR, with estimates ranging from 0.17 (Syne1) to 0.87
(Chi3l1). The changes in survival across levels of gene
expression and clinical variables for the population
under consideration were visualized using survival plots.
The decline in the probability of lifetime survival across
age (in years) for individuals with high (75th percentile)
and low (25th percentile) levels of Syne1 is depicted in
Figure 1. Consistent with the hazard ratio estimate (HR
= 0.17, P-value < 0.0001), the probability of survival of
individuals with high levels of Syne1 remains higher
across age. Individuals with high and low levels of Syne1


Table 4 Genes that have a cohort-dependent association (P-value < 0.05) with the lifetime glioblastoma survival

Gene Symbol   Probe Identifier   Clinical Cohort   P-value   Level of Clinical Cohort   Hazard Ratio 1      Relevant literature references
Prkcb 2       207957_s_at        Gender            <.0001    Male                       0.36 (0.24-0.55)    [28] G
                                                             Female                     1.27 (0.84-1.93)
                                 Therapy           0.0006    None                       0.38 (0.25-0.60)
                                                             CRnoT                      0.51 (0.36-0.73)
                                                             R                          0.64 (0.46-0.88)
                                                             CRT                        0.71 (0.44-1.13)
                                                             Other                      0.75 (0.43-1.32)
Sox10         209843_s_at        Race              0.0018    White                      0.55 (0.44-0.68)    [10] G
                                                             Other                      1.08 (0.72-1.62)
Egfr          211607_x_at        Gender            <.0001    Male                       0.60 (0.50-0.72)    [24] G
                                                             Female                     0.88 (0.74-1.04)
Chi3l1        209396_s_at        Therapy           0.0006    CRT                        1.27 (0.96-1.70)    [70] G
                                                             R                          1.28 (1.07-1.52)
                                                             Other                      1.31 (1.04-1.66)
                                                             CRnoT                      1.53 (1.31-1.79)
                                                             None                       2.42 (1.56-3.75)
C2            203052_at          Gender            0.0033    Female                     1.30 (1.03-1.65)    n/a
                                                             Male                       1.93 (1.56-2.39)
Prkcb         209685_s_at        Gender            <.0001    Female                     1.31 (0.79-2.14)    [28] G
                                                             Male                       5.21 (3.16-8.61)

n/a, No association with any type of cancer found in literature;

1 Hazard ratio estimate (95% confidence interval);

2 Interaction with a single clinical cohort factor;

G Gene confirmed in an independent glioblastoma multiforme study; the number between square brackets denotes the corresponding literature reference;
° Gene confirmed in an independent study on any other type of cancer; the number between square brackets denotes the corresponding literature reference.
Table 5 Genes that have a general association (P-value < 0.05) with the overall glioblastoma survival

Gene Symbol   Probe Identifier   P-value    Hazard Ratio 1       Relevant literature references
Tgfa          205015_s_at        0.0002     0.12 (0.04-0.37)     [28] G
Sirpa         202895_s_at        <.0001     0.24 (0.14-0.41)     [26] G
Ctbp2         210835_s_at        <.0001     0.28 (0.16-0.48)     [101]°
Eef1e1        213907_at          <.0001     0.37 (0.23-0.61)     [102]°
Mapk3         212046_x_at        0.0041     0.43 (0.24-0.76)     [103]°
Actr2         200727_s_at        <.0001     0.43 (0.29-0.63)     [24] G
Igh@          211637_x_at        0.0167     0.44 (0.23-0.86)     n/a
Plcg1         216551_x_at        <.0001     0.46 (0.31-0.68)     [26] G
Mgat3         209764_at          <.0001     0.51 (0.37-0.71)     [26] G
Lrp10         201412_at          0.0041     0.60 (0.42-0.85)     [24] G
Idh1          201193_at          0.0051     0.60 (0.42-0.86)     [24] G
Tmem8b        207839_s_at        <.0001     0.60 (0.46-0.77)     n/a
Ccna2         203418_at          <.0001     0.60 (0.49-0.75)     [104]°
Topors        204071_s_at        0.0007     0.61 (0.46-0.81)     [26] G
Rpl37a        213459_at          0.0164     0.66 (0.47-0.93)     [90] G
Mdm2          217373_x_at        <.0001     0.69 (0.61-0.78)     [26] G
E2f3          203693_s_at        0.0672 2   0.75 (0.55-1.02)     [28] G
Mdfic         211675_s_at        0.0006     0.78 (0.68-0.90)     [105]°
Sod2          215078_at          <.0001     0.80 (0.73-0.88)     [25] G
Akr1c3        209160_at          0.0014     0.83 (0.73-0.93)     [26] G
Thbs4         204776_at          0.0007     1.18 (1.07-1.30)     [106]°
Shc3          206330_s_at        0.0031     1.32 (1.10-1.59)     [28] G
Pik3r1        212249_at          0.0145     1.34 (1.06-1.69)     [20] G
Nkx2-5        206578_at          0.0027     1.38 (1.12-1.70)     [107]°
Hras          212983_at          0.0187     1.42 (1.06-1.90)     [85] G
Bhlhb9        213709_at          0.0192     1.42 (1.06-1.92)     [108]°
C9orf95       219147_s_at        0.0004     1.43 (1.17-1.73)     [109]°
CUorflOl      219254_at          0.0085     1.46 (1.10-1.94)     [110]°
Nol3          59625_at           <.0001     1.46 (1.21-1.76)     [111]°
Rangap1       212125_at          0.0225     1.47 (1.06-2.05)     [26] G
Ftsj2         222130_s_at        0.017      1.48 (1.07-2.05)     [112]°
Rrm1          201476_s_at        0.0006     1.49 (1.19-1.87)     [27] G
Jag2          209784_s_at        0.0351     1.63 (1.03-2.57)     [24] G
Tnpo1         212635_at          0.0054     1.89 (1.20-2.96)     n/a
Myo7a         211103_at          0.0033     1.97 (1.25-3.10)     [113]°
Actr2         200729_s_at        0.0001     2.18 (1.47-3.23)     [24] G
Csf1          209716_at          <.0001     2.33 (1.65-3.27)     [26] G
Ank1          208352_x_at        0.0003     2.38 (1.49-3.82)     [24] G
B3galnt1      211379_x_at        <.0001     2.40 (1.76-3.28)     [114]°
Kras          214352_s_at        0.002      2.44 (1.38-4.31)     [85] G
Ewsr1         210012_s_at        0.0005     2.49 (1.49-4.15)     [26] G
Sec24c        202361_at          <.0001     2.84 (1.76-4.60)     n/a
Rpl10l        217559_at          <.0001     2.95 (1.83-4.74)     [115]°
Ighg1         211908_x_at        0.0007     3.41 (1.68-6.93)     [116]°
Ighg1         211693_at          0.0007     4.33 (1.86-10.04)    [116]°



n/a, No association with any type of cancer found in literature; 

1 Hazard ratio estimate (95% confidence interval); 

2 Borderline significant (P-value < 0.1) included for completeness; 

G Gene confirmed in an independent glioblastoma multiforme study; the number between square brackets denotes the corresponding literature reference; 
° Gene confirmed in an independent study on any other type of cancer; the number between square brackets denotes the corresponding literature reference. 

Table 6 Genes that have a cohort-dependent association (P-value < 0.05) with the overall glioblastoma survival 



Gene Symbol  Probe Identifier  Clinical Cohort  P-value  Level of Clinical Cohort  Hazard Ratio 1     Relevant literature references
Polr2d       214144_at         Therapy          0.0044   Other                     0.35 (0.18-0.70)   [117]°
                                                         R                         0.50 (0.31-0.81)
                                                         CRT                       0.68 (0.38-1.21)
                                                         None                      0.77 (0.43-1.39)
                                                         CRnoT                     0.93 (0.58-1.5)
Igf2bp3      203820_s_at       Gender           0.0146   Female                    1.02 (0.84-1.24)   [118]°
                                                         Male                      1.29 (1.12-1.49)



n/a, No association with any type of cancer found in literature; 
1 Hazard ratio estimate (95% confidence interval); 

G Gene confirmed in an independent glioblastoma multiforme study; the number between square brackets denotes the corresponding literature reference; 
° Gene confirmed in an independent study on any other type of cancer; the number between square brackets denotes the corresponding literature reference. 



have a survival probability of 50% at 69 and 52 years of 
age, respectively. The opposite trend was observed in 
the remaining 24 profiles that have hazard ratio esti- 
mates ranging from 1.16 (Ppbp) to 4.7 (Pdcd4). 

Among the genes exhibiting cohort-dependent associations
with lifetime survival (Table 4), the drop in the
probability of lifetime survival across age (in years) for
females and males with high (75th percentile) and low
(25th percentile) levels of Prkcb 209685_s_at is portrayed
in Figure 2. Consistent with the hazard ratio estimates
for females (HR = 1.31) and males (HR = 5.21), the
probability of survival declines faster in males with high
levels of Prkcb than in females with low levels of this gene. 

Genes Associated with Overall Survival 

A total of 45 genes were associated with overall survival
(Tables 5 and 6). Among the cohort-independent associations,
an increase in the levels of 20 genes was associated
with a decrease in overall hazard, with HR
ranging from 0.12 (Tgfa) to 0.83 (Akr1c3). On the other
hand, an increase in the level of 25 genes was associated
with an increase in overall hazard, with HR ranging from
1.18 (Thbs4) to 4.33 (Ighg1). Among the cohort-dependent
associations, the hazard increased more in males
(HR = 1.29) than in females (HR = 1.02) per unit
increase in the levels of Igf2bp3. 
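
The per-unit hazard ratios quoted above can be worked with directly. The following is a minimal sketch, assuming the usual log-linear form of the Cox model, of how a reported HR and its 95% confidence interval translate into a log hazard ratio with an approximate standard error, and how the HR scales with a larger change in expression. The function names and the two-unit change are illustrative assumptions; the Igf2bp3 estimates are those listed in Table 6.

```python
import math


def log_hr_and_se(hr, lower, upper, z=1.96):
    """Return the log hazard ratio and an approximate standard error,
    back-calculated from a reported HR and its 95% confidence interval."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


def hr_for_change(hr_per_unit, delta):
    """Hazard ratio implied by a change of `delta` expression units,
    assuming the log-linear form of the Cox model."""
    return math.exp(math.log(hr_per_unit) * delta)


# Igf2bp3 estimates as listed in Table 6.
beta_male, se_male = log_hr_and_se(1.29, 1.12, 1.49)
beta_female, se_female = log_hr_and_se(1.02, 0.84, 1.24)

# Hypothetical two-unit increase in expression (illustration only).
hr_male_two_units = hr_for_change(1.29, 2.0)
hr_female_two_units = hr_for_change(1.02, 2.0)
print(round(hr_male_two_units, 2), round(hr_female_two_units, 2))
```

Because the model is multiplicative on the hazard scale, a two-unit increase squares the per-unit hazard ratio, which widens the gap between males and females.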

Genes Associated with Progression-free Survival 

Of the 60 probes (corresponding to 57 genes) associated
with progression-free survival, 55 had general associations
and 5 had cohort-dependent associations (Tables 7
and 8). Among the genes that have cohort-independent
associations, an increase in the level of 23 genes was
associated with a decrease in HR, ranging from 0.11
(Pla2g7) to 0.85 (Cd24). For the remaining 32 genes, an
increase in the level of expression was associated with
an increase in the progression-free HR, ranging from
1.19 (Clec2b) to 5.28 (Paics). The decline in the
progression-free survival probability across time (in months) for
individuals with high (75th percentile) and low (25th
percentile) levels of neuroblastoma RAS viral (v-RAS)
oncogene homolog (Nras) is depicted in Figure 3.
Consistent with the hazard ratio estimate (HR = 3.93,
P-value < 0.0001), the progression-free survival probability
falls faster in individuals with high expression levels of
Nras. With regard to the cohort-dependent association
with progression-free survival, an increase in the
expression of Gdf10 was associated with a larger decrease in
the hazard in males (HR = 0.37) than in females
(HR = 0.80). 

Genes That Have Multiple Probes and Hazard Ratios 

When multiple probes of the same gene had opposite
associations with the glioblastoma hazard (e.g. HR > 1
for probe 1 and HR < 1 for probe 2), the disagreements
were resolved by assessing the dependability of each
probe. Information on dissenting probes is briefly
summarized here. Probe 214322_at, of Camk2g, was
obtained from an ovary EST and thus is less reliable
with respect to brain cancer than probe 212757_s_at. Probe
208728_s_at, of Cdc42, is expected to be more reliable
than 208727_s_at because the former was obtained from
an mRNA sequence that is twice the length of the
latter. Probe 200729_s_at, of Actr2, corresponds to an
mRNA and is more reliable than 200727_s_at, which
corresponds to an EST from mixed tissues. Probe
210904_s_at, of Il13ra1, corresponds to a cluster of
mRNA assigned to this gene in the NCBI-Gene database;
meanwhile, probe 211612_s_at pertains to a single
mRNA not assigned to the gene and is considered less
reliable. Other probes include 201148_s_at of Timp3,
which corresponds to a pancreatic EST, 200727_s_at of
Actr2, which corresponds to a mixed tissue EST, and
209956_s_at of Camk2b, which corresponds to a
proline-rich sequence. 

Functional and Gene Network Analyses 

The GO categories enriched (FDR adjusted P-value <
0.1, ≥ 3 genes/category) among the genes associated
with each of the three glioblastoma survival variables 
Table 7 Genes that have a general association (P-value < 0.05) with the progression-free glioblastoma survival 
t\urRuD 


/I 1 QQA i -at 

4 1 3oo_i_ai 


U.UUUd 


1 03 M 30 0 Oi^\ 

z.Uj 1 1 .jy-z.yoj 


n/ a 


yf., 
Aty 


~>C\~7~) A£. -it 

zu/z4o_at 


0.001 6 


o n 3 1 3 
z.Uo (, I . o I -j.zzj 


L128J 


Smarcb1      212167_s_at       0.0004    2.06 (1.38-3.07)   [26] G
Utp20        209725_at         <.0001    2.08 (1.46-2.98)   n/a
Igl@         211655_at         0.0209    2.22 (1.13-4.38)   [129]°
Atf5         204998_s_at       <.0001    2.31 (1.72-3.11)   [130] G
Shox         207570_at         <.0001    2.66 (1.73-4.07)   [24] G
Loc283079    215929_at         0.0071    2.73 (1.31-5.69)   n/a
Ung          202330_s_at       0.0001    2.79 (1.66-4.68)   [27] G
Hnrnpd       213359_at         <.0001    2.94 (1.91-4.52)   n/a
Camk2b       209956_s_at       <.0001    3.02 (2.13-4.29)   [28] G
Timp3        201150_s_at       <.0001    3.10 (1.88-5.11)   [26] G
Nras         202647_s_at       <.0001    3.93 (2.60-5.95)   [28] G
Paics        214664_at         <.0001    5.28 (3.13-8.91)   [66]°



n/a, No association with any type of cancer found in literature; 

1 Hazard ratio estimate (95% confidence interval); 

2 Borderline significant (P-value < 0.1) included for completeness; 

G Gene confirmed in an independent glioblastoma multiforme study; the number between square brackets denotes the corresponding literature reference; 



° Gene confirmed in an independent study on any other type of cancer; the number between square brackets denotes the corresponding literature reference. 



are summarized in Tables 9, 10, and 11. The functional
analysis revealed nine, two and ten biological processes
enriched among the genes associated with lifetime, overall,
and progression-free survival, respectively, and three
molecular functions enriched among the genes associated
with progression-free survival. The biological processes
of cell cycle (GO:0007049) and death
(GO:0016265) were over-represented among the genes
associated with both lifetime and progression-free survival.
The gene networks for the significant genes from
the functional analyses associated with lifetime, overall,
and progression-free survival are depicted in Figures 4,
5 and 6, respectively. 

Cross-validation 

The performance of the gene sets as reliable prognosticators
of the three glioblastoma survival variables was evaluated.
The generalization capability of the biomarker
index was tested in individuals other than those used to
develop the hazard index, using a leave-one-individual-out
discriminant analysis. Individuals were predicted to
belong to the high or low glioblastoma hazard group for
each event using the predictive biomarker index, and the
prediction was compared to the observed classification
based on the length of the period corresponding to each
event. For both lifetime and overall hazard, the number
of observed high and low individuals was 100 each, and the
number of predicted high and low individuals was 97 and
103, respectively. For the progression-free hazard, the
number of observed high and low individuals was 87 and
88, respectively, and the number of predicted high and
low individuals was 83 and 92, respectively. 
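
The leave-one-individual-out procedure described above can be sketched as follows. This is an illustration only: the index values and group labels are hypothetical, and a nearest-class-mean rule on a one-dimensional index stands in for the discriminant analysis, whose exact form is not given in this section.

```python
def loocv_nearest_mean(index_values, labels):
    """Leave-one-out prediction of the hazard group for each individual.

    Each individual is held out in turn, the mean index of each group is
    computed from the remaining individuals, and the held-out individual
    is assigned to the group with the closer mean.
    """
    predictions = []
    for i, x in enumerate(index_values):
        means = {}
        for group in ("high", "low"):
            members = [
                v for j, (v, g) in enumerate(zip(index_values, labels))
                if j != i and g == group
            ]
            means[group] = sum(members) / len(members)
        predictions.append(min(means, key=lambda g: abs(x - means[g])))
    return predictions


# Hypothetical biomarker index values with observed hazard groups.
index_values = [2.1, 1.8, 1.5, 0.9, 0.2, -0.1, -0.7, -1.2, -1.6, -2.0]
labels = ["high"] * 5 + ["low"] * 5

predicted = loocv_nearest_mean(index_values, labels)
observed_counts = {g: labels.count(g) for g in ("high", "low")}
predicted_counts = {g: predicted.count(g) for g in ("high", "low")}
print(observed_counts, predicted_counts)
```

In this toy data set one borderline individual is reassigned, so the predicted group sizes differ slightly from the observed ones, which mirrors the kind of small discrepancy reported above.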

Additional evaluation of the results was pursued by 
comparing the genes associated with the three glioblas- 
toma hazards identified in this study and the target 
genes of microRNAs associated with the glioblastoma 
hazard reported by Delfino et al. [43]. One third of the 
sequences identified in this study are putative targets of 
microRNAs associated with glioblastoma. A hypergeo- 
metric test confirmed that the overlap between the 
genes uncovered in this study and the target genes was 
significant (P-value < 0.005). 
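
A hypergeometric overlap test of this kind can be sketched as below. The population size, the number of microRNA target genes and the overlap are hypothetical placeholders chosen only to illustrate the calculation; the counts actually used are not stated in this section.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are sampled without replacement
    from `population` items, of which `successes` are marked."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return numerator / comb(population, draws)


# Hypothetical numbers: 12000 genes on the platform, 3000 of them putative
# microRNA targets, 150 survival-associated genes, 50 of which are targets.
p_value = hypergeom_upper_tail(12000, 3000, 150, 50)
print(p_value)
```

The upper tail is used because the question is whether the overlap is larger than expected by chance.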



Table 8 Genes that have a cohort-dependent association (P-value < 0.05) with progression-free glioblastoma survival 

Gene Symbol  Probe Identifier  Clinical Cohort  P-value  Level of Clinical Cohort  Hazard Ratio 1     Relevant literature references
Gdf10        206159_at         Gender           0.0317   Male                      0.37 (0.23-0.60)   n/a
                                                         Female                    0.80 (0.45-1.42)
Vav3         218807_at         Race             0.008    Other                     0.41 (0.29-0.59)   [81] G
                                                         White                     0.68 (0.55-0.85)
Rps20        216246_at         Race             0.0003   Other                     0.75 (0.39-1.44)   [75]°
                                                         White                     1.83 (1.03-3.24)
Rab31        217764_s_at       Race             <.0001   White                     1.47 (0.93-2.30)   [74]°
                                                         Other                     7.72 (3.71-16.07)
Apool        213289_at         Therapy          0.0026   R                         1.64 (1.13-2.38)   [131]°
                                                         None                      1.93 (0.63-5.98)
                                                         CRnoT                     2.23 (1.55-3.20)
                                                         Other                     3.86 (1.83-8.13)
                                                         CRT                       4.82 (2.69-8.63)





n/a, No association with any type of cancer found in literature; 
1 Hazard ratio estimate (95% confidence interval); 

G Gene confirmed in an independent glioblastoma multiforme study; the number between square brackets denotes the corresponding literature reference 
° Gene confirmed in an independent study on any other type of cancer; the number between square brackets denotes the corresponding literature reference. 

Figure 1 Probability of lifetime glioblastoma survival across
age for spectrin repeat containing, nuclear envelope 1 (Syne1).
Probability of glioblastoma survival across age for individuals with
low (25th percentile) and high (75th percentile) expression levels of
spectrin repeat containing, nuclear envelope 1 (Syne1). With a lower
hazard estimate (HR = 0.17), the probability of survival of individuals
with high levels of Syne1 remains higher across age. 



Discussion 

The data set analyzed offered a suitable representation 
of the general population of glioblastoma cases. The 
median overall survival was 13 months, and the prob- 
abilities of survival at 12, 24, 36, 48 and 60 months 
post-diagnosis were 0.59, 0.25, 0.15, 0.11 and 0.07 
respectively, in this study. The median survival is similar 
to that reported by Krex et al. [14], and the 60-month
survival probability is comparable to the 5-year survival
rate of 0.13 estimated for grade IV brain cancer reported
by the National Cancer Institute Surveillance, Epidemiology,
and End Results program [44]. The similarity between the 
survival rate in this study and that reported for primary 
glioblastoma suggests an insignificant fraction of sec- 
ondary glioblastoma samples among the samples ana- 
lyzed [20]. 

Comparing findings against a literature review confirmed
that the Cox survival analysis of multiple gene
expression profiles and clinical variables simultaneously
was an effective tool to detect an integrated set of gene
expression profiles exhibiting general and cohort-dependent
associations with the three glioblastoma survival
variables. The majority of the genes associated with lifetime,
overall, and progression-free survival in this study
have been previously reported to be associated with
glioblastoma (35, 24, and 35 genes, respectively) or with
another cancer (10, 19, and 15, respectively). In addition,
the multi-factor analysis and data used in this study
allowed several novel associations between gene profiles
and glioblastoma survival to be uncovered. Specifically,
16, 4, and 10 previously unreported genes were
associated with lifetime, overall, and progression-free
survival, respectively, in the present work. The discussion 




[Figure 2 plot; x-axis: Age (years)]

Figure 2 Probability of lifetime glioblastoma survival across
age in females and males for protein kinase, C beta (Prkcb).
Probability of glioblastoma survival across age for females and
males with low (25th percentile) and high (75th percentile)
expression levels of protein kinase, C beta (Prkcb). Consistent with
the hazard ratio estimates for females (HR = 1.31) and males (HR =
5.21), the probability of survival in individuals with high levels of
Prkcb declines earlier than in individuals with lower levels of Prkcb.
Due to the significant interaction between the expression of Prkcb
and gender, the probability of survival for females with high levels of
the gene declines faster than the probability of survival for males
with low levels of the gene. 

of the findings from our study is divided into genes 
associated with multiple survival variables, genes asso- 
ciated with glioblastoma in a cohort-independent or 
cohort-dependent manner, and further investigation of 
complex associations. 

Pik3r1 and E2f3 were associated with all three
glioblastoma survival variables (Tables 2, 3, 5 and 7). The
higher glioblastoma hazards associated with higher levels 




[Figure 3 plot; x-axis: Progression-free period (months)]

Figure 3 Progression-free probability across post-diagnosis
months for neuroblastoma RAS viral (v-RAS) oncogene
homolog (Nras). Progression-free probability across post-diagnosis
months for individuals with low (25th percentile) and high (75th
percentile) expression levels of neuroblastoma RAS viral (v-RAS)
oncogene homolog (Nras). With a high hazard estimate (HR = 3.93),
the progression-free probability falls faster in individuals with high
expression levels of Nras. 

Table 9 Gene Ontology categories enriched among the genes associated with lifetime glioblastoma survival 1 

Gene Ontology       Level  Term                                               P-value   FDR adjusted P-value  Number of genes  Genes
Biological process  3      aging (GO:0007568)                                 2.13E-05  1.62E-03              3                Pdcd4, Cdkn2a, Tp53
                           regulation of biological process (GO:0050789)      2.77E-04  1.50E-02              20               Usf2, Cdkn2a, Ccnb2, Akt2, Tp53, Cdc42, Six6, Jag2, Lin7c, Pdcd4, Csf1, Topors, Spg21, Akt1, Egfr, Sox10, C2, Scn5a, Arhgef4, Cdk2
                           protein localization (GO:0008104)                  1.49E-03  3.77E-02              7                Topors, Akt1, Sar1a, Egfr, Timm23, Tp53, Lin7c
                           cell division (GO:0051301)                         2.04E-03  3.88E-02              3                Ccnb2, Cdc42, Cdk2
                           cell cycle (GO:0007049)                            3.75E-03  5.14E-02              7                Pdcd4, Egfr, Cdk2, Cdkn2a, Ccnb2, Tp53, Jag2
                           nitrogen compound metabolic process (GO:0006807)   4.06E-03  5.14E-02              4                Chst4, Akt1, Egfr, Chi3l1
                           cell proliferation (GO:0008283)                    8.32E-03  8.99E-02              6                Csf1, Topors, Egfr, Cdk2, Tp53, Jag2
                           death (GO:0016265)                                 9.46E-03  8.99E-02              6                Akt1, Cdkn2a, Tp53, Jag2, Pdcd4, Topors
                    4      cell aging (GO:0007569)                            4.38E-06  9.42E-04              3                Pdcd4, Cdkn2a, Tp53

1 Only GO categories with False Discovery Rate (FDR) adjusted P-value < 0.1 and represented by three or more genes. 



of Pik3r1 observed in this study are supported by previous
work showing that over-expression of this gene
plays a role in the activation of the PI3K/Akt pathway,
resulting in cell proliferation and tumor invasion [45].
Likewise, a link between E2f3 and glioblastoma has
been reported [28,46]. Among the 15 genes associated
with two glioblastoma events (Table 2), Akr1c3, Csf1,
Jag2, Plcg1, Rpl37a, Sod2, and Topors were associated
with lifetime and overall survival (Tables 3 and 5). Jag2
has been associated with adenomas [47], pancreatic
[48] and breast cancer [49], Rpl37a with nasopharyngeal
carcinoma cell lines [50], and the rest with
glioblastoma [26,51-54]. The consistent findings across
both glioblastoma survival events suggest that these
genes may have specific roles in death. Likewise, the
association between Hras and overall and progression-free
survival (Tables 2, 5 and 7) is consistent with previous
glioblastoma studies [55] and suggests that this
gene may have a role in aggressive glioblastoma
growth. Fstl1, Mtap, Tp53, Camk2g (probe 214322_at), and
Il13ra1 (probe 210904_s_at) were associated with lifetime
and progression-free survival (Tables 2, 3 and 7),
and these associations are supported by previous
studies [20,24,25,28,56-58]. 

Most genes (lifetime survival, 55 out of 61 genes; overall
survival, 45 out of 47 genes; and progression-free
survival, 55 out of 60 genes) were associated with survival
in a general or cohort-independent manner. The
most extreme cohort-independent changes in lifetime
survival were observed in Syne1 (HR = 0.17) and Pdcd4
(HR = 4.68); the former profile has been found in
lung [59], ovarian [60], colon, and breast cancers [61],
while the latter has been associated with glioma [62].
The most extreme cohort-independent changes in overall
survival were observed in Ighg1 (HR = 4.33) and Tgfa
(HR = 0.12); the former trend has been found in
cancer cell lines [63], while the latter gene is present in
the KEGG glioma pathway [28]. Lastly, the genes that
presented extreme hazard ratio values and a general
association with progression-free survival are Pla2g7 (HR =
0.11) and Paics (HR = 5.28). The Pla2g7 and Paics
trends identified in this study are consistent with those
reported for breast cancer in mice [64] and in
non-glioma types of cancer [65,66], respectively. 



Table 10 Gene Ontology categories enriched among the genes associated with overall glioblastoma survival 

Gene Ontology       Level  Term                                              P-value   FDR Adjusted P-value  Number of genes  Genes
Biological process  4      anatomical structure morphogenesis (GO:0009653)   5.39E-05  1.16E-02              9                Nkx2-5, Csf1, Mapk3, Tgfa, Thbs4, Jag2, Igf2bp3, Myo7a, Hras
                    6      organ morphogenesis (GO:0009887)                  3.83E-06  2.40E-03              7                Nkx2-5, Csf1, Mapk3, Tgfa, Jag2, Myo7a, Hras



1 Only GO categories with False Discovery Rate (FDR) adjusted P-value < 0.1 and represented by three or more genes. 

Table 11 Gene Ontology categories enriched among the genes associated with progression-free survival 

Gene Ontology       Level  Term                                                         P-value       FDR Adjusted P-value  Number of genes  Genes
Biological process         cell cycle (GO:0007049)                                      [illegible]   [illegible]           11               Hras, Ppp1r15a, App, [illegible], Atf5, Pten, E2f3, Wee1, Tp53, Ccnd1, Nras
                           death (GO:0016265)                                           3.18E-04      1.21E-02              9 2              App, Raf1, Atf5, Pten, Fadd, Hspa1a/Hspa1b, Tp53, Ppp1r15a
                           response to biotic stimulus (GO:0009607)                     4.11E-03      5.13E-02              6 2              Fadd, Hspa1a/Hspa1b, Clec2b, Ccnd1, Ifngr1
                           response to abiotic stimulus (GO:0009628)                    8.17E-03      6.90E-02              6 2              Fadd, Hspa1a/Hspa1b, Clec2b, Ccnd1, Ifngr1
                    4      cell cycle process (GO:0022402)                              4.26E-06      9.16E-04              10               App, Atf5, Pten, E2f3, Wee1, Tp53, Ccnd1, Nras, Hras, Ppp1r15a
                    5      regulation of cell cycle (GO:0051726)                        2.90E-07      1.22E-04              10               Wee1, Tp53, Ccnd1, App, Nras, Hras, Ppp1r15a, Atf5, Pten, E2f3
                    5      regulation of progression through cell cycle (GO:0000074)    1.47E-07      9.24E-05              10               Tp53, Ccnd1, Nras, Hras, Ppp1r15a, App, Atf5, Pten, E2f3, Wee1
                           cell death (GO:0008219)                                      2.13E-04      6.18E-02              9 2              Ppp1r15a, App, Raf1, Atf5, Pten, Fadd, Hspa1a/Hspa1b, Tp53
                    7      programmed cell death (GO:0012501)                           1.19E-04      2.23E-02              9 2              Ppp1r15a, App, Raf1, Atf5, Pten, Fadd, Hspa1a/Hspa1b, Tp53
                    8      apoptosis (GO:0006915)                                       1.41E-04      5.21E-02              9 2              Ppp1r15a, App, Raf1, Atf5, Pten, Fadd, Hspa1a/Hspa1b, Tp53
Molecular function  3      pattern binding (GO:0001871)                                 3.26E-04      3.55E-02              3                Fstl1, App, Fgfr2
                           carbohydrate binding (GO:0030246)                            1.26E-03      6.89E-02              4                Fstl1, App, Fgfr2, Clec2b
                    4      polysaccharide binding (GO:0030247)                          3.01E-04      8.70E-02              3                Fstl1, App, Fgfr2



1 Only GO categories with False Discovery Rate (FDR) adjusted P-value < 0.1 and represented by three or more genes; 

2 Although Hspala/Hspalb are represented by the same probe (202581_at), these isoforms are counted as two units. 



Several genes (lifetime survival, 6 out of 61 genes;
overall survival, 2 out of 47 genes; and progression-free
survival, 5 out of 60 genes) were associated with
glioblastoma survival in a cohort-dependent manner. These
findings indicate that effective use of these genes in
prognostic indices or in therapy development must consider
the personal characteristics of the individual.
Higher levels of C2 and Prkcb (probe 209685_s_at) were
associated with a higher lifetime death hazard in males
(HR = 1.93 and 5.21, respectively) than in females (HR
= 1.30 and 1.31, respectively), and the profile of the latter
gene has been observed in colon cancer cell lines
[67]. The lifetime hazard estimate decreased with
increased levels of Sox10 in Caucasian individuals (HR =
0.55) compared to non-Caucasian individuals, and this
pattern is concordant with the broad distribution of Sox10
in high grade gliomas [68]. Increases in the level of
Chi3l1 were associated with significant increases in lifetime
hazard estimates across all therapies, with the highest
hazard ratio observed in individuals receiving no
therapy (None, HR = 2.42). This trend is consistent with
reports that Chi3l1/Ykl-40 was highly overexpressed in
glioblastoma relative to nonneoplastic brain [69] and
that Ykl-40 is associated with poorer response to radiation
and shorter lifetime survival in glioblastoma [70].
Males (HR = 0.36) and individuals receiving no therapy
(HR = 0.38) have the lowest hazard ratio per unit increase in
Prkcb (probe 207957_s_at). These trends are consistent
with those reported for other cancer types [67] and with
observations of protein kinase C activation in
gamma-irradiated proliferating and confluent human lung
fibroblast cells [71]. 

The cohort-dependent associations between overall
survival and both Polr2d and Igf2bp3 have been
observed in colorectal cancer [72] and glioblastoma [73],
respectively. Three genes (Rab31, Rps20 and Apool)
exhibited a cohort-dependent association with
progression-free survival (Table 8) that is consistent with
previously reported trends [74-76]. Lastly, the gender-dependent
association between Gdf10 and progression-free survival is in
agreement with reports of copy number loss of Gdf10 in
mesothelioma [77]. 

Further analyses of the association between individual
genes (with or without clinical variables) and hazards
were undertaken when the trend estimated from the
multi-gene index was opposite to that previously
reported. Nine genes and survival events were re-analyzed
individually and compared to previous reports,
including: E2f3 and all three survival variables [28,46],
Egfr and lifetime survival [78], Csf1 and lifetime survival,
Mdm2 and overall hazard [79], Fstl1 and lifetime and
progression-free survival [25], Mtap and progression-free
hazard [57], Pdcd4 and lifetime survival [62], Tgfa
and overall survival [80], and race-dependent Vav3 and
overall survival [81]. In the first six cases, the consideration
of the gene alone as a predictor of glioblastoma survival,
as is standard in previous reports, resulted in
non-significant associations in this study. These results indicate
that the accurate identification of biomarkers and
precise characterization of the trend require the study
of the genes in concert with other genes in a systems
biology framework, such as the approach implemented
in this study. Re-analysis of Pdcd4 and Vav3 confirmed
the significant trend detected in the multi-gene analysis,
suggesting that further studies are needed to precisely
characterize the trend. 

Figure 4 Gene network from the functional analysis of lifetime glioblastoma survival. Interaction between the significant genes from the
functional analysis of lifetime glioblastoma death. The gold edges represent protein interactions whereas the red edges represent interaction of
the HNF4A protein with the DNA of the genes Pdcd4, Sox10 and Timm23. Of the 24 genes from Table 9, 18 (pink nodes) interact among each
other in a direct way or through an intermediate gene (blue nodes). 

The LOOCV confirmed the adequacy of the set of 
genes and clinical variables identified to predict the glio- 
blastoma hazards. The minor differences between the 
observed and predicted numbers in each group may be 
due to the discretization of the survival length into high 
and low groups required by the discriminant analysis; 



whereas, the Cox survival analysis models continuous 
time to the glioblastoma event. The significant number 
of genes prognostic of glioblastoma survival identified in 
this study that are also targets of microRNAs associated 
with glioblastoma [43] further confirms our results. 

In addition to literature review and LOOCV, the
gene-survival associations detected in this study were confirmed
using the information from the REMBRANDT
database. The associations between survival and the 10
gene probes with the most extreme hazard ratio estimate
for each of the three survival variables studied that
did not interact with cohort variables (Tables 3, 5, 7)
were investigated in REMBRANDT. The query was performed
using the Kaplan-Meier survival plot for Gene
Expression Data. Of these, eight genes had the same significant
trend observed in our study (Syne1, Gigyf2/Tnrc15,
Scn5a, Hoxa10, Pdcd4, Tgfa, Pla2g7 and
Agpat1), two did not have information in the
REMBRANDT database (Ighg1 and Hnrnpd), Fstl1 had
a trend opposite to the one observed in our study 

Figure 5 Gene network from the functional analysis of overall
glioblastoma survival. Relationship between the significant genes
from the functional analysis of overall survival. Of the nine genes
from Table 10, four (pink nodes) interact among each other in a
direct way or through an intermediate gene (blue nodes). 



and in previous independent studies (Table 3), and most
of the remaining genes, although non-significant, had
the same trend observed in our analysis. The latter
results are consistent with the simpler analytical
approach based on Kaplan-Meier curves available in
REMBRANDT, when compared to the more flexible
Cox survival analysis used in our study. The Kaplan-Meier
approach relies on a non-parametric rank-based
test to compare the survival between individuals with
high and low gene expression. These groups are
obtained by setting an arbitrary expression threshold.
Non-parametric rank-based approaches tend to have
lower power to detect significant variation than
semi-parametric and parametric approaches such as the Cox survival
analysis. In addition, the Kaplan-Meier analysis only
allows the consideration of one explanatory variable at a
time, and this variable has to be discrete (hence the
comparison of high and low expression groups in
REMBRANDT). This approach does not allow
multiple continuous covariates (i.e. gene expression),
factors (e.g. race, gender, therapy and progression)
or interactions to be considered simultaneously. The Cox survival analysis
implemented in our study allows the simultaneous
consideration of multiple factors (such as possible population
stratification due to race), covariates (e.g. other
gene expression profiles) and interactions, and it does
not require the discretization of the gene expression
values, which could result in loss of information.
Thus, the Cox approach used in our study is able to
capture the association between continuous gene
expression values and survival conditional on all other
model terms and is able to detect associations that are
unlikely to reach statistical significance using the
Kaplan-Meier comparison of survival between high and
low gene expression groups. 
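
The dichotomized comparison discussed above rests on the Kaplan-Meier product-limit estimator, which can be sketched in a few lines. The follow-up times, censoring indicators and the split into a high and a low expression group are hypothetical; the sketch only shows how each group's survival curve is obtained once an expression threshold has been chosen, and it is not the REMBRANDT implementation.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    `times` are follow-up times and `events` flags whether the event was
    observed (True) or the individual was censored (False). Returns a list
    of (time, survival probability) pairs, one per distinct event time.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(order) and order[i][0] == t:
            deaths += 1 if order[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for groups above and below a threshold.
high_expression = kaplan_meier([3, 5, 5, 8, 12], [True, True, False, True, True])
low_expression = kaplan_meier([6, 9, 14, 20, 24], [True, False, True, True, False])
print(high_expression)
print(low_expression)
```

Each curve uses only group membership, so the magnitude of expression within a group and any other covariate are ignored, which is the loss of information that the Cox model avoids.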

Among the GO categories, 19 biological processes and 
three molecular functions were over-represented (FDR 
adjusted P-value < 0.1, > 3 genes per category) in the 
genes associated with the three glioblastoma events stu- 
died (Tables 9, 10 and 11). Two biological processes, 
cell cycle (GO:0007049) and death (GO:0016265), were 
over-represented in the lifetime and progression-free 
survival (Tables 9 and 11), and several biological pro- 
cesses have been previously associated with glioblastoma 
[17,62,68,70,79,82-86]. These processes included: aging, 
morphogenesis, cell cycle and proliferation, and death 
for lifetime survival; morphogenesis for overall survival; 
and cell cycle, death and recognition, death, response to 
biotic and abiotic stimuli, programmed cell death, and 
apoptosis for progression-free survival. 
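
The selection rule used above (FDR-adjusted P-value < 0.1 and more than three genes per category) can be sketched in Python as follows. The text here does not state which enrichment test or adjustment was applied, so this sketch assumes a one-sided hypergeometric test with Benjamini-Hochberg adjustment. The function names, category labels and counts are hypothetical and are not taken from Tables 9 to 11.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are selected from N, of which K are in the category."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_categories(categories, n_selected, n_background, fdr=0.1, min_genes=3):
    """Keep categories with adjusted p-value < fdr and more than min_genes genes.

    `categories` maps a label to (genes selected in category, category size).
    """
    names = list(categories)
    pvalues = [
        hypergeom_pvalue(categories[name][0], n_selected, categories[name][1], n_background)
        for name in names
    ]
    adjusted = benjamini_hochberg(pvalues)
    return [
        name
        for name, p_adj in zip(names, adjusted)
        if p_adj < fdr and categories[name][0] > min_genes
    ]


# Hypothetical counts: 60 selected genes out of a background of 10000.
toy_categories = {
    "category_a": (8, 200),   # strongly over-represented
    "category_b": (3, 50),    # too few genes to pass the size filter
    "category_c": (5, 900),   # close to the count expected by chance
}
selected = enriched_categories(toy_categories, n_selected=60, n_background=10000)
```

The size filter is applied after the adjustment so that the FDR correction covers every category tested; applying it before would be an equally defensible choice and would change the adjusted values.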

The study of complementary glioblastoma survival variables 
confirmed that the gene profiles associated 
with lifetime survival resulting in the enriched functional 
category of aging are clearly associated with cancer initiation 
and progression and are not simply a reflection of the 
natural aging process. Two results confirm that the biomarkers 
are not merely confounded with aging. First, the 
genes in the GO terms "aging (GO:0007568)" and "cell 
aging (GO:0007569)", Pdcd4, Cdkn2a, and Tp53, have all 
been associated with GBM in previous independent stu- 
dies (Table 3). In addition, Tp53 was associated with pro- 
gression-free survival (Table 7). Second, other functional 
terms enriched among the genes associated with lifetime 
glioblastoma survival were also identified in the other 
glioblastoma survival variables studied. The biological pro- 
cesses of cell death and cell cycle were enriched both for 
lifetime and progression-free survival. 

Figure 6 Gene network from the functional analysis of progression-free survival. Relationship between the significant genes from the 
functional analysis of progression-free survival. Of the 19 genes from Table 11, 17 (pink nodes) interact among each other in a direct way or 
through an intermediate gene (blue nodes). 

The biological processes, molecular functions and 
gene networks particular to a glioblastoma survival 
event offered insights into the processes particular to 
the initiation and progression of this cancer. For 
instance, eight biological processes associated with lifetime 
survival were level 3, and one was level 4, indicating 
that the differentially expressed genes associated 
with lifetime survival participate in broad or general 
biological mechanisms. The interconnection between 
the genes pertaining to aging further confirms the sig- 
nificance of this gene network on lifetime survival (Fig- 
ure 4). Although only two biological processes were 
associated with overall survival, these processes corre- 
spond to levels 4 and 6. This result indicates that the 
genes associated with overall survival correspond to 
more specific mechanisms. Moreover, both biological 
processes are related to generation and organization of 
anatomical structures, such as organs, and this finding 
may be associated with the dispersion and development of 
malignant cells after diagnosis and resection. The close 
relationship between biomarker genes in this network 
supports this finding (Figure 5). Although the study of 
progression-free survival encompassed a shorter period 
than lifetime and overall survival, the functional analysis 
showed several biological processes and molecular func- 
tions over-represented among the genes associated with 
this survival. Four of the biological processes range from 
level 6 to 8, indicating that specific gene networks and 
roles are associated with progression-free survival. The 
biological processes associated with progression-free 
survival include regulation of progression through cell 
cycle, programmed cell death, and apoptosis. Extensive 
relationships between the biomarker genes in the cell 
cycle were identified, further supporting the major role 
of this network in glioblastoma progression (Figure 6). 
In addition, three molecular functions were enriched 
among the genes associated with progression-free survi- 
val. Therefore, many biological and molecular events 
occur in the period between the diagnosis of malignancy 
and progression or recurrence, probably due to response 
to numerous treatments, surgery, and cancer progres- 
sion. Two genes were highly represented across the 
categories (Tables 9 to 11). Tp53 has an important role 
as a tumor suppressor [83], and App is highly expressed 
in individuals with short-term glioblastoma survival [24]. 

Conclusions 

An innovative approach to simultaneously identify multiple 
biomarkers of lifetime, overall and progression-free 
glioblastoma survival in a systems biology framework 
was presented. Furthermore, the inclusion of clinical 
information allowed the uncovering of general and indi- 
vidualized associations between gene expression profiles 
and three complementary survival metrics. This study 
demonstrated the advantage of developing multi-gene 
prognostic indices of glioblastoma survival through 
the integration of variable selection and survival models 
over the simple yet simplistic single-gene analysis. 
Known biomarker gene profiles were confirmed, and 
new general and clinical-dependent gene profiles were 
uncovered. The present study looked at glioblastoma in 
general and complements work on the identification of 
genes associated with specific glioblastoma types 
[42,87,88]. Empirically confirmed findings will be the 
basis for improved prognostic tools and individualized 
treatments that improve the survival and quality of life 
of individuals suffering from glioblastoma multiforme. 

Additional material 



Additional file 1: List of genes associated in glioblastoma from the 
literature. Table containing the list of 174 genes previously reported in 
the literature. 